Audio: MFCC: More updates and topologies to run MFCC for Mel spectrogram audio features in SDW PCs by singalsu · Pull Request #10750 · thesofproject/sof

singalsu · 2026-05-07T14:25:48Z

This PR contains more updates for MFCC.

The accuracy of the Mel audio features is improved by using the 32-bit Q9.23 format. The previous 16-bit Q9.7 format left very few significant bits for the normalized log10 Mel values, which are quite small, in the -1 to +1 range.
Fixes for many issues.
Adds a topology for SDW devices to run MFCC in a branched microphone capture pipeline. An example topology is shown below with MFCC for both the headset microphone and the notebook device microphone.
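
The precision argument in the description can be illustrated with a small sketch. It is not SOF code: the helper names `to_fixed()` and `from_fixed()` are made up for this example. It quantizes a real value to a fixed-point integer with a given number of fractional bits, so the step size of Q9.7 (2^-7) can be compared with that of Q9.23 (2^-23).

```c
#include <stdint.h>

/* Quantize a real value to a signed fixed-point integer with frac_bits
 * fractional bits, rounding to nearest. Hypothetical helper, not SOF code.
 */
static int32_t to_fixed(double x, int frac_bits)
{
	double scaled = x * (double)(1 << frac_bits);

	return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a fixed-point integer back to a real value. */
static double from_fixed(int32_t q, int frac_bits)
{
	return (double)q / (double)(1 << frac_bits);
}
```

With these helpers, 0.01 becomes 1 in Q9.7, which reads back as 0.0078125, and 83886 in Q9.23, which reads back within about 1e-8 of the input. The whole -1 to +1 range covers only 256 steps in Q9.7.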

Copilot

Pull request overview

This PR updates the MFCC feature extraction path to improve Mel log precision (moving key Mel outputs to 32-bit Q9.23) and adds SoundWire (SDW) topology support for branched “audio features capture” pipelines (MFCC/Mel output alongside normal capture).

Changes:

Switch Mel filterbank 32-bit output to Q9.23 (int32) and propagate this through MFCC processing and tuning utilities.
Add new topology2 pipeline/class and SDW platform includes to expose MFCC/Mel “audio features capture” PCMs for jack and DMIC.
Refactor MFCC tune scripts (run script + MATLAB/Octave decoders) to handle multiple bit depths and Xtensa runs.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 12 comments.

File	Description
tools/topology/topology2/platform/intel/sdw-jack-generic.conf	Inserts a module-copier stage in the jack capture path to act as a branch point for audio-feature capture.
tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf	New SDW jack MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline.
tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf	New SDW DMIC MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline.
tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf	New reusable pipeline class intended to perform SRC then MFCC then host capture.
tools/topology/topology2/include/common/common_definitions.conf	Adds feature flags to gate SDW jack/DMIC audio-feature capture includes.
tools/topology/topology2/development/tplg-targets.cmake	Adds new SDW topology build targets enabling MFCC audio-feature capture.
tools/topology/topology2/cavs-sdw.conf	Includes the new pipeline class and gates inclusion of new SDW audio-feature capture platform snippets.
test/cmocka/src/math/auditory/auditory.c	Updates unit test to accommodate 32-bit Mel log output and compares against legacy reference after downscaling.
src/math/auditory/mel_filterbank_32.c	Changes psy_apply_mel_filterbank_32() output from int16 Q9.7 to int32 Q9.23.
src/include/sof/math/fft.h	Adds icomplex16 include (header dependency fix).
src/include/sof/math/auditory.h	Updates psy_apply_mel_filterbank_32() signature to int32 output.
src/include/sof/audio/mfcc/mfcc_comp.h	Forces MFCC to 32-bit FFT path and extends state for 32-bit Mel log storage/output pointers.
src/audio/mfcc/tune/run_mfcc.sh	Refactors MFCC tuning runner into reusable functions and adds optional Xtensa testbench execution.
src/audio/mfcc/tune/README.txt	Updates tuning documentation to match new output files and decode workflow.
src/audio/mfcc/tune/decode_mel.m	Extends Mel decoder to support s16/s24/s32 formats and raw/wav reading.
src/audio/mfcc/tune/decode_all.m	New helper to decode/plot all generated MFCC/Mel outputs in one go.
src/audio/mfcc/mfcc.c	Simplifies prepare logging; removes a sink buffer size check.
src/audio/mfcc/mfcc_setup.c	Adjusts setup behavior for sample rate mismatch; adds scratch allocation for 32-bit Mel log output and updates free paths.
src/audio/mfcc/mfcc_hifi4.c	Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling.
src/audio/mfcc/mfcc_hifi3.c	Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling.
src/audio/mfcc/mfcc_generic.c	Removes duplicate fft-fill implementation.
src/audio/mfcc/mfcc_common.c	Implements shared fft-fill routine; updates Mel processing to use/maintain Q9.23 and updates s24/s32 Mel-only output behavior.
src/audio/mfcc/Kconfig	Switches MFCC to select 32-bit Mel filterbank support.
scripts/rebuild-testbench.sh	Exports XTENSA_PATH in generated Xtensa environment setup script.

Comments suppressed due to low confidence (1)

src/audio/mfcc/Kconfig:13

MFCC is now hard-coded to use 32-bit FFT (MFCC_FFT_BITS=32), but this Kconfig only selects MATH_FFT (which defaults to 16-bit FFT support) and does not select MATH_32BIT_FFT. This can lead to link/build failures when fft_execute_32() isn’t compiled. Select MATH_32BIT_FFT here (or make MFCC_FFT_BITS configurable and select the matching FFT width).

	select CORDIC_FIXED
	select MATH_32BIT_MEL_FILTERBANK
	select MATH_AUDITORY
	select MATH_DCT
	select MATH_DECIBELS
	select MATH_FFT
	select MATH_MATRIX
	select MATH_WINDOW

singalsu · 2026-05-12T12:51:01Z

+			{
+				source	src.$index.1
+				sink	mfcc.$index.1
+			}


It's a common convention in pipeline classes to leave the last widget (copier) unconnected. The upper-level topology can then add widgets to the pipeline if needed. Also, the copier index seems to be the PCM ID.

Change the Mel filterbank 32-bit variant psy_apply_mel_filterbank_32() output from int16_t Q9.7 (was wrongly commented as Q8.7) to int32_t Q9.23 format for improved signal resolution. The output parameter type is changed from int16_t* to int32_t* in both the implementation and the header declaration.

The auditory unit test is updated to allocate int32_t output and convert Q9.23 to Q9.7 for comparison against existing reference vectors.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

The input samples must be shifted logically to sign bit and then shifted right arithmetically into place for the 16 bit saturation instruction to work correctly. This fixes a possible overflow with large input.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
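
The shift sequence in the commit above can be modeled in plain C. This is a sketch under assumptions, not the HiFi intrinsic code from the patch. It assumes S24 samples sit in the low 24 bits of a 32-bit word with possibly stale upper bits, and the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 24-bit sample stored in the low bits of a 32-bit word.
 * The left shift is done on an unsigned value (a logical shift), which
 * discards stale upper bits and moves bit 23 to the sign position. The
 * right shift is on a signed value and is assumed to be arithmetic; that
 * is implementation-defined in ISO C but is what GCC and Clang do.
 */
static int32_t sign_extend_s24(int32_t word)
{
	return (int32_t)((uint32_t)word << 8) >> 8;
}

/* Round a Q1.23 sample to Q1.15 and saturate to the int16_t range. */
static int16_t s24_to_s16_sat(int32_t word)
{
	int32_t s = (sign_extend_s24(word) + 128) >> 8;

	if (s > INT16_MAX)
		return INT16_MAX;
	if (s < INT16_MIN)
		return INT16_MIN;
	return (int16_t)s;
}
```

The largest positive S24 value shows why saturation matters: 0x7FFFFF rounds to 32768, which does not fit in int16_t and is clamped to 32767.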

Remove the duplicate AE_MULFP32X16X2RS_H call in the 32-bit FFT path of mfcc_apply_window(). Its result was immediately overwritten by the AE_MULFP32X16X2RS_L call on the next line, making it dead code.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

This patch switches MFCC_FFT_BITS from 16 to 32 to use 32-bit FFT mode for better precision in the MFCC processing pipeline.

In cepstral mode (num_ceps > 0), the 32-bit Q9.23 Mel output from psy_apply_mel_filterbank_32() is converted to 16-bit Q9.7 before the existing 16-bit DCT calculation, preserving the current DCT and cepstral lifter behavior.

In Mel-only mode, output format depends on sink format:
- s16: Q9.7 (current format, backwards compatible)
- s24: Q9.15 (one int32_t per Mel value)
- s32: Q9.23 (full precision, one int32_t per Mel value)

The mel_log_32 scratch buffer is placed after power_spectra in the fft_buf scratch area. A bounds check is added in mfcc_setup() to fail if num_mel_bins exceeds the available scratch space.

The decode_mel.m Octave script is updated with s24 and s32 format support for the changed output encoding.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
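
The per-sink-format scaling listed in the commit above amounts to dropping 16, 8, or 0 fractional bits from the Q9.23 value. The sketch below shows that arithmetic only. The enum, the function name, and the round-to-nearest choice are assumptions for illustration; the patch may round or saturate differently.

```c
#include <stdint.h>

/* Hypothetical sink format tags for this sketch (not the SOF enum). */
enum sink_fmt { FMT_S16, FMT_S24, FMT_S32 };

/* Scale a Q9.23 Mel value to the fractional precision named in the commit
 * message: Q9.7 for s16, Q9.15 for s24, Q9.23 for s32. A 64-bit
 * intermediate avoids overflow when the rounding constant is added. Right
 * shift of a negative value is assumed to be arithmetic (GCC/Clang).
 */
static int32_t mel_q23_to_sink(int32_t mel_q23, enum sink_fmt fmt)
{
	int64_t v = mel_q23;

	switch (fmt) {
	case FMT_S16:
		return (int32_t)((v + (1 << 15)) >> 16);	/* Q9.23 -> Q9.7 */
	case FMT_S24:
		return (int32_t)((v + (1 << 7)) >> 8);	/* Q9.23 -> Q9.15 */
	default:
		return mel_q23;				/* keep Q9.23 */
	}
}
```

For example, 1.5 is 12582912 in Q9.23; it maps to 192 in Q9.7 and 49152 in Q9.15.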

When MFCC_FFT_BITS is 32, the HiFi3/4 mfcc_fill_fft_buffer() used AE_S16_0_XP to write 16-bit samples into 32-bit icomplex32 containers. This left the upper 16 bits of .real with stale data and .imag unzeroed, causing corrupted FFT input after the first frame when scratch buffers are reused for power_spectra and mel_log_32.

Replace all platform-specific implementations with a single generic C version in mfcc_common.c. The function performs only data copying with no arithmetic, so HiFi intrinsics provide very little benefit. The new implementation uses conditional pointer types (int16_t for 16-bit FFT, int32_t for 32-bit FFT) with matching element stride, and relies on the caller's bzero of fft_buf to keep imaginary parts zero.

Add missing icomplex16.h include to fft.h. The header uses struct icomplex16 in struct fft_plan but did not include its definition.

After psy_apply_mel_filterbank_16() writes Q9.7 int16_t values to mel_spectra->data, convert to Q9.23 in mel_log_32 so that all downstream processing (dynamic mmax, clamping, scaling, DCT) works correctly in 16-bit FFT mode.

Fix mel_log_32 scratch space check to use fft_buffer_size instead of assuming sizeof(icomplex32) per element, which overestimated available space by 2x in 16-bit mode.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
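
The stale-bits problem described above goes away when each real part is written as a full 32-bit value. The following is a minimal sketch of that idea, not the function from mfcc_common.c. The struct is a local stand-in for a 32-bit complex element, and unlike the commit, which relies on the caller zeroing fft_buf, this version clears the imaginary part itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a 32-bit complex FFT element; names are assumptions. */
struct cplx32 {
	int32_t real;
	int32_t imag;
};

/* Copy 16-bit samples into the real parts of a 32-bit complex buffer.
 * Assigning the whole 32-bit field sign-extends the sample, so no stale
 * upper bits from a previous use of the scratch memory survive.
 */
static void fill_fft_buffer(struct cplx32 *fft_buf, const int16_t *in, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		fft_buf[i].real = in[i];
		fft_buf[i].imag = 0;
	}
}
```

A 16-bit store into the same container would leave the other half of each field untouched, which is the failure mode the commit message describes.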

In 32-bit FFT mode the input data is 16-bit stored in the lower half of a 32-bit icomplex32 container. The AE_MULFP32X16X2RS_L intrinsic performs a Q1.31 x Q1.15 fractional multiply, so the 16-bit sample must first be shifted left by 16 to Q1.31 format. Without this shift the multiply treats the value as having 16 zero fractional bits, producing near-zero windowed output and a corrupt FFT result.

Add the missing AE_SLAI32S(sample, 16) before the multiply in both HiFi3 and HiFi4 mfcc_apply_window() 32-bit paths, matching the generic C implementation.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
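
The effect of the missing shift can be reproduced with ordinary integer arithmetic. The sketch below is a plain C model of a Q1.31 x Q1.15 fractional multiply, not the HiFi intrinsic; both function names are hypothetical, and the rounding is an assumption.

```c
#include <stdint.h>

/* Shared tail: round a Q2.46 product back to Q1.31 and saturate.
 * Right shift of a negative value is assumed arithmetic (GCC/Clang).
 */
static int32_t q46_to_q31(int64_t prod)
{
	int64_t r = (prod + (1 << 14)) >> 15;

	if (r > INT32_MAX)
		r = INT32_MAX;
	return (int32_t)r;
}

/* Correct path: promote the 16-bit sample to Q1.31 first. Multiplying by
 * 65536 is used instead of a left shift so negative samples stay well
 * defined in ISO C.
 */
static int32_t window_sample_q31(int16_t sample, int16_t win_q15)
{
	int32_t s_q31 = (int32_t)sample * 65536;

	return q46_to_q31((int64_t)s_q31 * win_q15);
}

/* Faulty path: the sample is left in the low 16 bits of the container. */
static int32_t window_sample_unshifted(int16_t sample, int16_t win_q15)
{
	return q46_to_q31((int64_t)sample * win_q15);
}
```

With sample and window both 0.5 (16384 in Q1.15), the correct path returns 536870912, which is 0.25 in Q1.31, while the unshifted path returns 8192, a value 65536 times smaller.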

Add missing cleanup for fft_plan. After mod_fft_plan_new() succeeds, failures in window setup and mel filterbank initialization jumped to free_fft_out, leaking the fft_plan. Add free_fft_plan label and route these error paths through it.

Add missing cleanup for lifter.matrix. Late validation checks (mel_log_32 space, output capacity) jumped to free_dct_matrix, skipping the lifter matrix that may have been allocated. Add free_lifter label for these paths.

Replace rfree() with mod_free() in all error cleanup labels to match the mod_zalloc() allocations and the existing mfcc_free_buffers() implementation.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
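
The leak pattern fixed above is easiest to see in a reduced goto cleanup ladder. This sketch uses malloc()/free() and made-up names in place of the SOF allocators and the real mfcc_setup() state. It shows only the rule that each failure must jump to the label matching the most recent successful allocation.

```c
#include <stdlib.h>

/* Hypothetical state with four buffers allocated in this order. */
struct feat_state {
	void *fft_out;
	void *fft_plan;
	void *dct_matrix;
	void *lifter_matrix;
};

/* Allocate all buffers. late_check_ok == 0 simulates a validation failure
 * that happens after every allocation has succeeded.
 */
static int feat_setup(struct feat_state *st, size_t bytes, int late_check_ok)
{
	st->fft_out = malloc(bytes);
	if (!st->fft_out)
		return -1;

	st->fft_plan = malloc(bytes);
	if (!st->fft_plan)
		goto free_fft_out;

	st->dct_matrix = malloc(bytes);
	if (!st->dct_matrix)
		goto free_fft_plan;

	st->lifter_matrix = malloc(bytes);
	if (!st->lifter_matrix)
		goto free_dct_matrix;

	if (!late_check_ok)
		goto free_lifter;	/* jumping lower would leak the lifter */

	return 0;

free_lifter:
	free(st->lifter_matrix);
	st->lifter_matrix = NULL;
free_dct_matrix:
	free(st->dct_matrix);
	st->dct_matrix = NULL;
free_fft_plan:
	free(st->fft_plan);
	st->fft_plan = NULL;
free_fft_out:
	free(st->fft_out);
	st->fft_out = NULL;
	return -1;
}
```

Labels are ordered newest allocation first, so control falls through and releases everything allocated before the failure point.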

Refactor run_mfcc.sh into functions for input conversion and testbench execution to reduce code duplication. Add Xtensa testbench support when XTENSA_PATH environment variable is set, producing xt_ prefixed output files.

Add decode_all.m Octave script to decode and plot all MFCC cepstral and Mel spectrogram output files from run_mfcc.sh, including Xtensa variants.

Update README.txt to document the current run_mfcc.sh output files, Xtensa support, and decode_all.m usage.

Export XTENSA_PATH in rebuild-testbench.sh so that run_mfcc.sh can find the Xtensa toolchain path for the testbench build.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

The checks previously done in prepare() are done in the module adapter.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

The module copier allows the capture pipeline to be branched for different processing. In this patch series the module-copier is added so that audio features extraction can run from the shared headset microphone endpoint.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Add a new host-gateway-src-mfcc-capture pipeline class that chains SRC (48 kHz to 16 kHz) with the MFCC component for audio features extraction.

Two new platform configuration files are added:
- sdw-jack-audio-feature.conf: taps the SoundWire jack capture path (module-copier 11) into an SRC+MFCC pipeline (pipeline 130, PCM 47)
- sdw-dmic-audio-feature.conf: taps the SoundWire DMIC capture path (module-copier 41) into an SRC+MFCC pipeline (pipeline 131, PCM 48)

Both are gated by new IncludeByKey defines SDW_JACK_AUDIO_FEATURE_CAPTURE and SDW_DMIC_AUDIO_FEATURE_CAPTURE (default false) in cavs-sdw.conf.

Development topology targets are added for MTL rt713 and ARL cs42l43+cs35l56 configurations with MFCC features capture enabled.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Copilot

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

src/audio/mfcc/Kconfig:13

COMP_MFCC now hard-codes 32-bit MFCC processing (MFCC_FFT_BITS=32) and the code calls fft_execute_32(), but this Kconfig only selects MATH_FFT (which defaults to 16-bit support) and does not select MATH_32BIT_FFT. With default Kconfig values this can lead to missing 32-bit FFT objects at link time. Please select MATH_32BIT_FFT here (or make MFCC_FFT_BITS conditional on CONFIG_MATH_32BIT_FFT).

config COMP_MFCC
	tristate "MFCC component"
	depends on COMP_MODULE_ADAPTER
	select CORDIC_FIXED
	select MATH_32BIT_MEL_FILTERBANK
	select MATH_AUDITORY
	select MATH_DCT
	select MATH_DECIBELS
	select MATH_FFT
	select MATH_MATRIX
	select MATH_WINDOW

 	/* get sink data format and period bytes */
 	sink_format = audio_stream_get_frm_fmt(&sinkb->stream);
-	sink_period_bytes = audio_stream_period_bytes(&sinkb->stream, dev->frames);
-	comp_info(dev, "source_format = %d, sink_format = %d",
-		  source_format, sink_format);
-	if (audio_stream_get_size(&sinkb->stream) < sink_period_bytes) {
-		comp_err(dev, "sink buffer size %d is insufficient < %d",
-			 audio_stream_get_size(&sinkb->stream), sink_period_bytes);
-		ret = -ENOMEM;
-		goto err;
-	}
+	comp_info(dev, "source_format = %d, sink_format = %d", source_format, sink_format);

 	cd->config = comp_get_data_blob(cd->model_handler, &data_size, NULL);


+	"true" "platform/intel/sdw-jack-audio-feature.conf"
+}
+
+IncludeByKey.SDW_DMIC_AUDIO_FEATURE_CAPTURE {
+	"true" "platform/intel/sdw-dmic-audio-feature.conf"


+	SDW_JACK_AUDIO_FEATURE_CAPTURE_PCM_NAME "Jack In Audio Features"
+	SDW_JACK_AUDIO_FEATURE_CAPTURE_PCM_ID 47
+	SDW_JACK_AUDIO_FEATURE_CAPTURE_STREAM_NAME "Jack In Audio Features Stream"
+	SDW_JACK_AUDIO_FEATURE_CAPTURE_PIPELINE_ID 130


+	SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_NAME "Microphone Audio Features"
+	SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_ID 48
+	SDW_DMIC_AUDIO_FEATURE_CAPTURE_STREAM_NAME "Microphone Audio Features Stream"
+	SDW_DMIC_AUDIO_FEATURE_CAPTURE_PIPELINE_ID 131


+			Object.Base.input_audio_format [
+				{
+					in_bit_depth		32
+					in_valid_bit_depth	32
+					in_rate			48000
+				}
+			]
+			Object.Base.output_audio_format [
+				{
+					out_bit_depth		32
+					out_valid_bit_depth	32
+					out_rate		16000
+				}



-		/* Convert powerspectrum to Mel band logarithmic spectrum */
-		mat_init_16b(state->mel_spectra, 1, state->dct.num_in, 7); /* Q8.7 */
+		/* Convert powerspectrum to Mel band logarithmic spectrum Q9.23 */


+	 *  +-------------------------------------+------------------+
+	 *  | 3. power_spectra[],                 | 6. mel_log_32[], |
+	 *  |    32 bits, e.g. x257 -> 1028 bytes |    32b, 92 bytes |
 	 *  +-------------------------------------+------------------+
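
The scratch layout in the comment fragment above places mel_log_32[] directly after power_spectra[]. The sketch below shows the kind of bounds check such a layout needs. The function and parameter names are illustrative and are not taken from mfcc_setup(). The figures in the comment (257 power bins giving 1028 bytes, and 92 bytes of Mel data, which corresponds to 23 int32_t values) are used as the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Return the byte offset of the Mel log buffer inside the scratch area,
 * or -1 if power spectra plus Mel values do not fit. The subtraction is
 * only evaluated after power_bytes is known to fit, so it cannot wrap.
 */
static long mel_log_offset(size_t scratch_bytes, size_t num_power_bins,
			   size_t num_mel_bins)
{
	size_t power_bytes = num_power_bins * sizeof(int32_t);
	size_t mel_bytes = num_mel_bins * sizeof(int32_t);

	if (power_bytes > scratch_bytes)
		return -1;
	if (mel_bytes > scratch_bytes - power_bytes)
		return -1;

	return (long)power_bytes;
}
```

With a 4096-byte scratch area (an assumed size for this example), 257 power bins and 23 Mel bins fit, and the Mel data starts at byte 1028. The same request fails when only 1100 bytes are available, because 1028 + 92 = 1120.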


singalsu mentioned this pull request May 7, 2026

Audio: MFCC: Add Mel spectrogram mode via configuration blob #10743

Merged

singalsu force-pushed the mfcc_use_32bit_fft_mel branch from 40bb97f to 1768663 May 12, 2026 11:44

singalsu changed the title ~~Audio: MFCC: Use 32 bit FFT and Mel frequency scale filters for better precision~~ Audio: MFCC: More updates and topologies to run MFCC for Mel spectrogram audio features in SDW PCs May 12, 2026

singalsu marked this pull request as ready for review May 12, 2026 11:55

singalsu requested a review from ranj063 as a code owner May 12, 2026 11:55

Copilot AI review requested due to automatic review settings May 12, 2026 11:55

singalsu requested review from dbaluta, jsarha, kv2019i, lbetlej, lgirdwood, mmaka1 and plbossart as code owners May 12, 2026 11:55

Copilot started reviewing on behalf of singalsu May 12, 2026 11:55

Copilot AI reviewed May 12, 2026

singalsu added 11 commits May 12, 2026 15:22

Audio: MFCC: Remove obsolete check for sink size

1315ef5

The checks previously done in prepare() are done in the module adapter.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

singalsu force-pushed the mfcc_use_32bit_fft_mel branch from 1768663 to d64404a May 12, 2026 13:08

singalsu requested a review from Copilot May 12, 2026 13:09

Copilot started reviewing on behalf of singalsu May 12, 2026 13:11

Copilot AI reviewed May 12, 2026

singalsu wants to merge 11 commits into thesofproject:main from singalsu:mfcc_use_32bit_fft_mel

2 participants
